As a group of scientists, our team is curious about how humans can affect the natural environment. We wondered if weather conditions in a highly populated region might correlate with notable human activities on a global or local scale.
There is a variety of literature available on global climate change, suggesting that the Earth’s surface temperature is increasing, on average, over time in a way that threatens many lifeforms on our planet. This warming is primarily due to human activity: the combustion of fossil fuels resulting in the emission of carbon dioxide into the atmosphere. We decided to consider local weather trends in a specific urban area to see what we might learn.
After some searching, we discovered that historical daily weather records from locations around the United States are made publicly available by the National Oceanic and Atmospheric Administration via its website: https://www.ncdc.noaa.gov/cdo-web. We found that a data station in the middle of Central Park in New York City has made more than 56,000 daily weather observations dating back to 1869. The variables observed included daily maximum temperature in degrees Fahrenheit (TMAX), daily minimum temperature in degrees Fahrenheit (TMIN), and daily precipitation (PRCP) and snowfall (SNOW) in inches. We decided to analyze these data to see what trends we might uncover. Our data span the days from February 28, 1869, to September 26, 2022.
Temperature data from Central Park have been studied in the past, and warming trends were observed; we wanted to see what we might discover on our own, from analysis of the original data.
Given that Central Park is an oasis in the middle of a city environment, we wondered whether any weather trends correlate with major human activities, both globally and locally. To explore these possibilities, we formulated the following preliminary questions:
To dig in, we began with temperature.
We started by understanding the distributions of our temperature data to rule out any anomalies or unusual characteristics.
The distributions of both TMAX and TMIN do not appear to be normal at first glance, with TMAX being slightly skewed to the left and TMIN having two peaks. As expected, the distribution of TMAX includes temperature observations that are greater than those of TMIN, with a significant overlap. That overlap can be attributed to seasonality within the data. Because we have a large number of observations, the sample means we compare are approximately normally distributed (by the central limit theorem), so we proceed with statistical tests that assume normality.
To understand the effect of seasonality on the temperature data, we calculated the average daily temperatures by month. As we can see here, there is a trend of rising temperatures when we move from winter to summer months. This is to be expected since summer will see higher average daily maximum and minimum temperatures than winter. Seasonality is an aspect of our data that we might need to account for in our testing, as temperature ranges clearly differ between summer and winter months.
Now that we had insight into the distribution of our variables, it was time to start assessing whether the data confirmed any significant changes in daily temperatures over time. Specifically, we wanted to test whether changes in average daily maximum temperatures coincide with the documented rise in global temperatures over the last century. A report on the global change in Earth’s temperature and climate is available via the National Oceanic and Atmospheric Administration: https://www.ncdc.noaa.gov/sotc/global/202113.
We grouped the daily temperature observations into 3 distinct eras in American history. The eras are described as:
1. Industrial Era: 1900-1940
2. Cold War Era: 1940-1980
3. Modern Era: 1980-Present
This distinction was important because we could now compare temperature observations across three independent groups, and each era contains roughly 15,000 daily observations, giving us sample sizes large enough for statistically powerful tests.
The calculated average daily maximum temperature of each era suggests there’s a slight difference across the time periods. To verify the statistical significance of these differences, we ran an ANOVA test comparing the three eras.
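As a rough illustration of this step, the sketch below runs a one-way ANOVA in Python with SciPy's `f_oneway`. It is an example on synthetic data rather than the analysis code itself: the simulated temperatures, the helper `era_of`, and the choice to assign the boundary years 1940 and 1980 to the later era are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the daily record (assumed values, not station data):
# a small warming trend buried in large day-to-day variation.
years = rng.integers(1900, 2023, size=12000)
tmax = 60.0 + 0.03 * (years - 1900) + rng.normal(0.0, 15.0, size=years.size)


def era_of(year):
    """Map a year to an era; boundary years go to the later era (assumption)."""
    if year < 1940:
        return "Industrial"
    if year < 1980:
        return "Cold War"
    return "Modern"


labels = np.array([era_of(y) for y in years])
groups = {name: tmax[labels == name] for name in ("Industrial", "Cold War", "Modern")}

# One-way ANOVA: the null hypothesis is that all three era means are equal.
f_stat, p_value = stats.f_oneway(*groups.values())

for name, values in groups.items():
    print(f"{name:<10} n={values.size:5d}  mean TMAX={values.mean():.2f}")
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```

A small p-value here only says that at least one era mean differs from the others; it does not say which pairs differ or by how much.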
The test returned a p-value much lower than our threshold of 0.05, indicating that the differences between the average daily maximum temperatures of the 3 eras are statistically significant.
That’s good news, but just how different are they from each other? For that answer, we can turn to Tukey’s HSD (honestly significant difference) test. This test helps us compare the differences between all possible pairs of our groupings and evaluates their significance.
The plot above displays the estimated difference between the means for each pairwise era comparison, along with a 95% confidence interval for that difference. The largest estimated difference in means is between the Industrial and Modern eras. The adjusted p-value for each pairwise comparison was practically 0, indicating that the differences in the average daily maximum temperature between the eras are statistically significant.
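For readers who want to reproduce this kind of pairwise comparison, here is a minimal sketch using `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). The three samples are hypothetical draws with assumed means and spread, not the Central Park observations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical era samples (deg F); means and spread are assumed for the example.
names = ["Industrial", "Cold War", "Modern"]
samples = [
    rng.normal(60.0, 15.0, size=4000),
    rng.normal(61.0, 15.0, size=4000),
    rng.normal(63.0, 15.0, size=4000),
]

result = stats.tukey_hsd(*samples)
ci = result.confidence_interval(confidence_level=0.95)

# statistic[i, j] is mean(group i) - mean(group j); report later minus earlier.
for i in range(len(names)):
    for j in range(i):
        print(
            f"{names[i]} - {names[j]}: diff={result.statistic[i, j]:.2f}, "
            f"95% CI=({ci.low[i, j]:.2f}, {ci.high[i, j]:.2f}), "
            f"adjusted p={result.pvalue[i, j]:.3g}"
        )
```

A confidence interval that excludes zero corresponds to a pairwise difference that is significant at the 5% level after adjusting for the three comparisons.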
Through this analysis, we are able to safely reject the hypothesis that the average daily maximum temperature remains the same across each era. There is significant evidence to suggest that our TMAX variable is increasing over time. We can utilize linear regressions to dive further into this relationship.
We then created a linear model of TMAX vs. year to understand the temperature trends since 1900. The fit parameters were statistically significant, and suggested that both the maximum and minimum daily temperatures in New York’s Central Park have increased on average over time at a rate of approximately 0.026 degrees Fahrenheit per year. While the p-value for this parameter is < 2e-16 (well below the threshold of alpha = 0.05), the overall fit is poor, with an adjusted r-squared value of 0.00245.
The poor fit is likely due to the wide range of daily temperatures that occur in a given year as a result of seasonal variation. The following plot of daily maximum temperatures shows the wide variance of the data around the linear model.
To improve the fit and model temperature trends more completely, we decided to account for seasonal variation by also including month as a categorical regressor. The resulting fit has an r-squared value of 0.775 and a slope of 0.025 degrees Fahrenheit per year, with all fit parameters’ p-values well below 0.05. The different intercepts for each level of the categorical variable (the twelve months of the year) indicate that January is the coldest and July the hottest month in Central Park, with an average difference in maximum daily temperature of approximately 46 degrees Fahrenheit in any given year over this window.
These two extremes and their linear models are plotted in the following figure; it is clear that the multiple regression is a much better model of temperature trends, consistent with the higher r-squared value.
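The effect of adding month as a categorical regressor can be sketched with plain NumPy least squares. Everything in the example is an assumption chosen for illustration: the synthetic seasonal cycle, the 0.025 degrees Fahrenheit per year trend, the noise level, and the helper `fit`. January is used as the baseline level, so each month coefficient is an offset from January.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

# Synthetic daily maxima (assumed values): seasonal cycle + 0.025 deg F/yr trend + noise.
year = rng.integers(1900, 2023, size=n)
month = rng.integers(1, 13, size=n)
seasonal = 62.0 - 23.0 * np.cos(2 * np.pi * (month - 1) / 12)
tmax = seasonal + 0.025 * (year - 1900) + rng.normal(0.0, 8.0, size=n)


def fit(X, y):
    """Ordinary least squares; returns the coefficients and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2


ones = np.ones(n)
t = (year - 1900).astype(float)

# Model 1: TMAX ~ year
beta_year, r2_year = fit(np.column_stack([ones, t]), tmax)

# Model 2: TMAX ~ year + month, one dummy column per month from February to December
dummies = (month[:, None] == np.arange(2, 13)).astype(float)
beta_full, r2_full = fit(np.column_stack([ones, t, dummies]), tmax)

print(f"year only:    slope={beta_year[1]:.4f} deg F/yr, R^2={r2_year:.3f}")
print(f"year + month: slope={beta_full[1]:.4f} deg F/yr, R^2={r2_full:.3f}")
print(f"July minus January: {beta_full[7]:.1f} deg F")
```

Both models recover a similar slope; the month terms mainly absorb seasonal variance, which is why the r-squared value rises so sharply while the trend estimate barely moves.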
To create an even better model, we’d need to use sinusoidal functions that capture the cyclical variation of weather with season; this would take us into scientific data analysis and time series modeling and is a topic for future consideration.
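One simple way to start on such a model, offered here only as a sketch, is harmonic regression: a single annual sine and cosine pair replaces the twelve month levels. Because a shifted cosine is a linear combination of a sine and a cosine, the model can still be fit by ordinary least squares. The synthetic data, the 365.25-day period, and the peak near day 200 are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
omega = 2 * np.pi / 365.25  # one cycle per year

# Synthetic daily maxima (assumed values): annual cycle peaking near day 200.
year = rng.integers(1900, 2023, size=n)
doy = rng.integers(1, 366, size=n)
tmax = (
    62.0
    + 0.025 * (year - 1900)
    + 23.0 * np.cos(omega * (doy - 200))
    + rng.normal(0.0, 8.0, size=n)
)

# A*cos(omega*d - phi) = A*cos(phi)*cos(omega*d) + A*sin(phi)*sin(omega*d),
# so the seasonal term is linear in the two harmonic columns.
X = np.column_stack([
    np.ones(n),
    year - 1900,
    np.sin(omega * doy),
    np.cos(omega * doy),
])
beta, *_ = np.linalg.lstsq(X, tmax, rcond=None)
intercept, slope, b_sin, b_cos = beta

amplitude = np.hypot(b_sin, b_cos)
peak_day = (np.arctan2(b_sin, b_cos) / omega) % 365.25

print(f"trend: {slope:.4f} deg F/yr")
print(f"seasonal amplitude: {amplitude:.1f} deg F, peak near day {peak_day:.0f}")
```

Real temperature cycles are not perfectly sinusoidal, so additional harmonics or a proper time series model would likely be needed in practice.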
We then wanted to compare the linear models developed for the Central Park weather data with the global trends observed over a similar time period. The reference rate of global temperature change comes from climate.gov, which states, “Earth’s temperature has risen by 0.14° Fahrenheit (0.08° Celsius) per decade since 1880, but the rate of warming since 1981 is more than twice that: 0.32° F (0.18° C) per decade.” To compare the Central Park and global temperature trends appropriately, we built new linear models for the Central Park data aligned to the time periods from climate.gov.
The first model observed the maximum daily temperatures from 1880-2022, shown in the figure above. Based on what we previously learned, this model fits TMAX vs. year and includes month as a categorical regressor to account for seasonal variation and improve the fit of the model. This resulted in a model fit with an r-squared value of 0.775 and a slope of 0.3° Fahrenheit per decade, with a p-value that is well below 0.05. The 95% confidence interval for this model’s slope is 0.28°-0.32° Fahrenheit per decade. The recorded global rate of temperature change, 0.14° Fahrenheit per decade, is outside of this confidence interval. Thus, we can conclude that the rate of temperature change in Central Park is significantly greater than the global rate of change from 1880-2022.
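The comparison between a fitted slope and a reference rate can be sketched as follows. The helper `ols_with_ci` is hypothetical, and the data are synthetic with an assumed trend of 0.30° Fahrenheit per decade; only the 0.14° Fahrenheit per decade global rate comes from the climate.gov quote. The interval uses the standard formula for ordinary least squares, which assumes independent errors with constant variance.

```python
import numpy as np
from scipy import stats

GLOBAL_RATE = 0.14  # deg F per decade since 1880, as quoted from climate.gov


def ols_with_ci(X, y, level=0.95):
    """Least-squares coefficients with two-sided confidence intervals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
    se = np.sqrt(np.diag(cov))
    t_crit = stats.t.ppf(0.5 + level / 2, dof)
    return beta, beta - t_crit * se, beta + t_crit * se


rng = np.random.default_rng(4)
n = 20000

# Synthetic daily maxima (assumed values) with a 0.30 deg F/decade trend.
year = rng.integers(1880, 2023, size=n)
month = rng.integers(1, 13, size=n)
decades = (year - 1880) / 10.0
tmax = (
    62.0
    - 23.0 * np.cos(2 * np.pi * (month - 1) / 12)
    + 0.30 * decades
    + rng.normal(0.0, 8.0, size=n)
)

# TMAX ~ decades + month (January as baseline)
dummies = (month[:, None] == np.arange(2, 13)).astype(float)
X = np.column_stack([np.ones(n), decades, dummies])
beta, low, high = ols_with_ci(X, tmax)

outside = not (low[1] <= GLOBAL_RATE <= high[1])
print(f"slope = {beta[1]:.3f} deg F/decade, 95% CI = ({low[1]:.3f}, {high[1]:.3f})")
print(f"global rate {GLOBAL_RATE} outside the interval: {outside}")
```

Daily temperatures are autocorrelated, so an interval computed this way is probably narrower than it should be; that caveat applies to any comparison built on it.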
The second model observed the maximum daily temperatures for the more recent time period of 1981-2022. This model similarly fits TMAX vs. year and includes month as a categorical regressor to account for seasonal variation and improve the fit of the model. However, in this model there was no statistically significant correlation between TMAX and year. The model resulted in a good fit, with an r-squared value of 0.764 and a slope of 0.1° Fahrenheit per decade, but the p-value of this slope was greater than 0.05. This indicates that more data may be required to make a meaningful model of temperature changes in NYC since 1981.
We wondered whether these trends were true for other locations in the New York City area. To assess this, we found data from another NOAA station at JFK International Airport. Because these data date only as far back as the airport (which was built in 1948), we focused on 1948 onward, computing linear models for both sites for this time window.
We were surprised to notice that the slope of the Central Park model for 1948 onward was lower than that of the model including observations from 1900 onward: only 0.014° F per year (r-squared = 0.771, all p << 0.05) compared to 0.025° F per year (r-squared = 0.773, all p << 0.05). This suggests that average Central Park warming was greater in the first half of the 20th century than in the second half, which is not what we would intuit based on the understanding that the global rate of warming is increasing.
We also found a higher warming rate at the JFK airport site, of approximately 0.033° Fahrenheit per year.
To see whether the differences among the warming rates in Central Park post-1900, in Central Park post-1948, and at JFK airport are real, we examined the 95 percent confidence intervals associated with each of the three slopes.
The confidence intervals for the slopes of the three models do not overlap, indicating that the warming rates differ significantly among the three models. This suggests that:
1. Temperature trends in Central Park are not strictly linear between 1900 and 2022.
2. The rate of warming at JFK airport is actually greater than that in Central Park between 1948 and 2022.
We hypothesize that these trends could be related to rate of development in the areas in question, as concrete can hold more heat than a non-built environment. To test this hypothesis, we could look for other data sets for these locations over the same time period that include some measure of construction.
As average temperatures on Earth rise, more evaporation occurs, which increases overall precipitation. To gauge the effect of this on the local level, we sought to analyze precipitation outliers in our dataset and determine if a significant change has occurred in rainfall over time.
Outliers are defined as values that lie significantly far from other points in the data. Because there were a lot of days with 0 inches of precipitation which skewed our distribution, we decided to remove them before calculating our outliers.
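To make this concrete, the sketch below flags precipitation outliers using Tukey's 1.5 × IQR fence computed on wet days only. The fence rule is an assumption chosen for illustration, since the exact outlier criterion is not stated above; the function `precipitation_outliers` and the sample values are likewise hypothetical.

```python
import numpy as np


def precipitation_outliers(prcp, k=1.5):
    """Return the upper fence and a mask of days above it.

    The fence is Q3 + k * IQR, computed only from days with PRCP > 0
    so that the many dry days do not drag the quartiles toward zero.
    """
    prcp = np.asarray(prcp, dtype=float)
    wet = prcp[prcp > 0]
    q1, q3 = np.percentile(wet, [25, 75])
    upper = q3 + k * (q3 - q1)
    return upper, prcp > upper


# Hypothetical daily precipitation in inches.
sample = [0, 0, 0.1, 0.2, 0.0, 0.3, 0.25, 0.15, 0.0, 3.5, 0.05, 0.4]
upper, mask = precipitation_outliers(sample)

print(f"upper fence: {upper:.3f} in")
print(f"outlier days: {int(mask.sum())}")
```

Only the upper fence is used because the lower fence falls below zero for strongly right-skewed rainfall data.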
Looking at the yearly data, we identified a discernible trend of increased variation in the amount of total rainfall within a year starting after 1970. The top 5 years based on total precipitation all reside in the last half century of the collected data.
Zooming out, we can see a relationship between the number of days with precipitation outlier values and the total precipitation amount occurring over the course of a decade. We notice an upward trend in both values starting in the 1970s, supporting the visual trends analyzed in the yearly totals plot above. This coincides with the patterns of increasing average daily temperature we saw in our prior analysis. Although not part of this report, we could further investigate the correlation between temperature and precipitation to determine if the data signal a statistically significant pattern.
Another indicator of warming trends, especially for a location such as New York City that experiences all 4 seasons, is the amount of snowfall occurring in a given year. Unfortunately, there was a sizable number of missing snowfall observations in our dataset, which made performing this analysis difficult. What we did have was consistent values on snowfall over the Christmas holiday (Christmas Eve and Christmas Day).
Since 1900, it has snowed on 24 out of 121 Christmas holidays (~20%). We observed a trend of declining snowfall events over the Christmas holiday beginning around 1980. Between 1980 and 2021, there have been only 4 snowfall events during the Christmas holiday (~10%). If we are able to acquire the necessary data, we can further test the significance of our observations to determine if a statistical correlation exists between snowfall and temperature.
After seeing the greater variability in precipitation more recently, we plotted total precipitation and total snowfall and fit these variables to linear models to observe the trends over time, shown in the figure below.
There is no clear trend in the linear model fit to total yearly Central Park precipitation since 1900. The model has a very weak correlation over time, with an r-squared value of 0.09, showing an increase in total precipitation over time at a rate of 0.75 inches per decade, with a p-value less than 0.05. However, given the poor fit to the data, this is likely a result of a greater number of high-leverage data points in more recent years that have artificially increased the overall trend. Our earlier observation of increased variability in yearly precipitation over time supports this explanation.
For total yearly snowfall in Central Park since 1900, there was no statistically significant trend observed in the data. We therefore cannot conclude that total yearly snowfall has changed over time. This is an interesting contrast to the rising temperature trends that were observed and warrants future exploration.
As we move forward with these data, we will further explore relationships between weather variables. For a preview of how this might look, we created a simple correlation plot of all numeric variables for Central Park since 1900.
Preliminarily, it appears that the slight correlation between year and maximum and minimum daily temperatures is reflected here, and that snow has an inverse correlation with the temperature variables. Bringing month back in as a categorical variable might be interesting here. We also note that there seems to be an inverse correlation between average daily temperature and year. However, the plot only compares pairwise relationships for which there are data, and not every day has data for average daily temperature, so these data may not be appropriately representative.
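The caveat about pairwise relationships can be illustrated with a small sketch. It computes each correlation from only the rows where both variables are present, and also reports how many rows each pair used. The helper `pairwise_corr`, the column name `TAVG`, and the five rows of values are hypothetical, chosen so that a sparsely recorded variable shows a sign opposite to a fully recorded one.

```python
import numpy as np


def pairwise_corr(columns):
    """Pearson correlations using, for each pair, only rows where both are present."""
    names = list(columns)
    k = len(names)
    corr = np.full((k, k), np.nan)
    counts = np.zeros((k, k), dtype=int)
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            x = np.asarray(columns[a], dtype=float)
            y = np.asarray(columns[b], dtype=float)
            ok = ~np.isnan(x) & ~np.isnan(y)
            counts[i, j] = ok.sum()
            if counts[i, j] >= 2:
                corr[i, j] = np.corrcoef(x[ok], y[ok])[0, 1]
    return names, corr, counts


# Hypothetical rows: TAVG is missing for the early years.
data = {
    "year": [1900, 1950, 2000, 2010, 2020],
    "TMAX": [60.0, 61.0, 62.0, 63.0, 64.0],
    "TAVG": [np.nan, np.nan, 56.0, 55.0, 54.0],
}
names, corr, counts = pairwise_corr(data)

print(f"year vs TMAX: r={corr[0, 1]:.2f} from {counts[0, 1]} rows")
print(f"year vs TAVG: r={corr[0, 2]:.2f} from {counts[0, 2]} rows")
```

Reporting the row count next to each coefficient makes it easier to spot correlations that rest on a short or unrepresentative stretch of the record.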
The last question our team wanted to address was whether changes in weather patterns may be associated with the COVID-19 lockdown. The COVID-19 lockdown had major social, economic, and political impacts in 2020. The lockdown in New York City in spring 2020 was one of the earliest to take effect and brought unprecedented changes in traffic and daily life for those who visited and worked in NYC daily. Our team set out to see if these major changes to the city were noticeable in the weather patterns at the time.
To do this, the average daily maximum temperatures for spring (considered April and May) and summer (considered June, July and August) were compared using the years leading up to the pandemic and the months following the lockdown order (Figure C4).
A t-test was performed to compare the means of the pre-lockdown and post-lockdown maximum daily temperatures (Table C2). For the summer lockdown months, the summer following lockdown appeared to be warmer on average. However, this was not a statistically significant difference in the average maximum temperatures.
The spring lockdown months showed a statistically significant difference in the means before and after the lockdown order. The post-lockdown months were significantly cooler, with a mean spring temperature of 63.5°F, than the years preceding the COVID-19 pandemic, which had a mean spring temperature of 67.3°F. This is an interesting finding, as similar studies found a decrease in the daytime land surface temperature during the COVID-19 lockdown (Parida et al., 2021). The authors attribute the change in temperature to the change in aerosols in the air. The contrast in results warrants further study to explore this trend.
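The pre- versus post-lockdown comparison described above can be sketched with `scipy.stats.ttest_ind`. The example uses Welch's variant, which does not assume equal variances; that choice is an assumption, as the specific t-test variant is not stated. The samples are hypothetical: their means echo the spring values reported above, while the spread and sample sizes are invented for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical spring (April-May) daily maxima in deg F. The means echo the
# values reported in the text; the spread and sample sizes are assumed.
pre_lockdown = rng.normal(67.3, 6.0, size=305)   # e.g. five earlier springs
post_lockdown = rng.normal(63.5, 6.0, size=61)   # spring 2020

# Welch's t-test: compares two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(pre_lockdown, post_lockdown, equal_var=False)

print(f"pre mean  = {pre_lockdown.mean():.1f} deg F")
print(f"post mean = {post_lockdown.mean():.1f} deg F")
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

Because one spring contributes far fewer days than several earlier springs combined, the unequal sample sizes are another reason to prefer a test that does not pool the variances.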
Parida, B. R., Bar, S., Kaskaoutis, D., Pandey, A. C., Polade, S. D., & Goswami, S. (2021). Impact of COVID-19 induced lockdown on land surface temperature, aerosol, and urban heat in Europe and North America. Sustainable Cities and Society, 75, 103336. https://doi.org/10.1016/j.scs.2021.103336
NOAA National Centers for Environmental Information, Monthly Global Climate Report for Annual 2021, published online January 2022, retrieved on November 2, 2022 from https://www.ncei.noaa.gov/access/monitoring/monthly-report/global/202113
United States Environmental Protection Agency, Climate Change Indicators: U.S. and Global Precipitation, published online August 2022, retrieved on November 2, 2022 from https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-precipitation